152 research outputs found

Automatic summarization of online debates

Authors: Aker A., Bontcheva K., Sanchan N.
Publication venue: 'Assoc. for Computational Linguistics Bulgaria'
Publication date: 15/08/2017

Debate summarization is a novel and challenging research area in automatic text summarization that has been largely unexplored. In this paper, we develop a debate summarization pipeline to summarize key topics which are discussed or argued in the two opposing sides of online debates. We take the view that the generation of debate summaries can be achieved by clustering, cluster labeling, and visualization. In our work, we investigate two different clustering approaches for the generation of the summaries. In the first approach, we generate the summaries by applying purely term-based clustering and cluster labeling. The second approach makes use of X-means for clustering and Mutual Information for labeling the clusters. Both approaches are driven by ontologies. We visualize the results using bar charts. We think that our results are a smooth entry point for users aiming to get a first impression of what is discussed within a debate topic containing a vast number of arguments.

arXiv.org e-Print Archive

Crossref

White Rose Research Online

Simple open stance classification for rumour analysis

Authors: Aker A., Bontcheva K., Derczynski L.
Publication venue: 'Assoc. for Computational Linguistics Bulgaria'
Publication date: 14/09/2017

Stance classification determines the attitude, or stance, in a (typically short) text. The task has powerful applications, such as the detection of fake news or the automatic extraction of attitudes toward entities or events in the media. This paper describes a surprisingly simple and efficient classification approach to open stance classification in Twitter, for rumour and veracity classification. The approach profits from a novel set of automatically identifiable problem-specific features, which significantly boost classifier accuracy and achieve above state-of-the-art results on recent benchmark datasets. This calls into question the value of using complex, sophisticated models for stance classification without first doing informed feature extraction.

arXiv.org e-Print Archive

Crossref

White Rose Research Online

Helping crisis responders find the informative needle in the tweet haystack

Authors: Bontcheva K., Derczynski L., Maynard D., Meesters K.
Publication venue: International Association for Information Systems for Crisis Response and Management (ISCRAM)
Publication date: 01/01/2018

Crisis responders are increasingly using social media, data and other digital sources of information to build a situational understanding of a crisis in order to design an effective response. However, with the increased availability of such data, the challenge of identifying relevant information from it also increases. This paper presents a successful automatic approach to handling this problem. Messages are filtered for informativeness based on a definition of the concept drawn from prior research and crisis response experts. Informative messages are tagged for actionable data -- for example, people in need, threats to rescue efforts, changes in environment, and so on. In all, eight categories of actionability are identified. The two components -- informativeness and actionability classification -- are packaged together as an openly available tool called Emina (Emergent Informativeness and Actionability).

arXiv.org e-Print Archive

White Rose Research Online

Tilburg University Repository

Understanding Human Preferences for Summary Designs in Online Debates Domain

Authors: Aker A., Bontcheva K., Sanchan N.
Publication venue: 'Centro de Innovacion y Desarrollo Tecnologico en Computo'
Publication date: 31/07/2016

Research on automatic text summarization has primarily focused on summarizing news, web pages, scientific papers, etc. While in some of these text genres it is intuitively clear what constitutes a good summary, the issue is much less clear-cut in social media scenarios like online debates, product reviews, etc., where summaries can be presented in many ways. As yet, there is no analysis of which summary representation is favored by readers. In this work, we empirically analyze this question and elicit readers’ preferences for the different designs of summaries for online debates. Seven possible summary designs in total were presented to 60 participants via an online study. Participants were asked to read and assign preference scores to each summary design. The results indicated that the combination of Chart Summary and Side-By-Side Summary is the most preferred summary design. This finding is important for future work in automatic text summarization of online debates.

White Rose Research Online

Vindication, virtue and vitriol: A study of online engagement and abuse toward British MPs during the COVID-19 pandemic

Authors: Bontcheva K., Farrell T., Gorrell G.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 12/08/2020

COVID-19 has given rise to a lot of malicious content online, including hate speech, online abuse, and misinformation. British MPs have also received abuse and hate on social media during this time. To understand and contextualise the level of abuse MPs receive, we consider how ministers use social media to communicate about the pandemic, and the citizen engagement that this generates. The focus of the paper is on a large-scale, mixed-methods study of abusive and antagonistic responses to UK politicians on Twitter during the pandemic, from early February to late May 2020. We find that pressing subjects such as financial concerns attract high levels of engagement, but not necessarily abusive dialogue. Rather, criticising authorities appears to attract higher levels of abuse during this period of the pandemic. In addition, communicating about subjects like racism and inequality may result in accusations of virtue signalling or pandering by some users. This work contributes to the wider understanding of abusive language online, in particular that which is directed at public officials.

arXiv.org e-Print Archive

White Rose Research Online

Longitudinal Modeling of Social Media with Hawkes Process based on Users and Networks

Authors: Bontcheva K, Cohn T, Lukasik M, Srijith P K
Publication date: 01/01/2017

Online social networks provide a platform for sharing information at an unprecedented scale. Users generate information which propagates across the network, resulting in information cascades. In this paper, we study the evolution of information cascades in Twitter using a point process model of user activity. We develop several Hawkes process models considering various properties, including conversational structure, users’ connections and general features of users including the textual information, and show how they are helpful in modeling the social network activity. We consider low-rank embeddings of users and user features, and learn the features helpful in identifying the influence and susceptibility of users. Evaluation on Twitter data sets associated with civil unrest shows that incorporating richer properties improves the performance in predicting future activity of users and memes.

Research Archive of Indian Institute of Technology Hyderabad

Social media and information overload: survey results

Authors: Bontcheva K., Gorrell G., Wessels B.

A UK-based online questionnaire investigating aspects of usage of user-generated media (UGM), such as Facebook, LinkedIn and Twitter, attracted 587 participants. Results show a high degree of engagement with social networking media such as Facebook, and a significant engagement with other media such as professional media, microblogs and blogs. Participants who experience information overload are those who engage less frequently with the media, rather than those who have fewer posts to read. Professional users show different behaviours to social users. Microbloggers complain of information overload to the greatest extent. Two thirds of Twitter users have felt that they receive too many posts, and over half of Twitter users have felt the need for a tool to filter out the irrelevant posts. Generally speaking, participants express satisfaction with the media, though a significant minority express a range of concerns including information overload and privacy.

White Rose Research Online

Getting More out of Biomedical Documents with GATE's Full Lifecycle Open Source Text Analytics.

Authors: Bontcheva K., Cunningham H., Roberts A., Tablan V.
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/02/2013

This software article describes the GATE family of open source text analysis tools and processes. GATE is one of the most widely used systems of its type, with yearly download rates of tens of thousands and many active users in both academic and industrial contexts. In this paper we report three examples of GATE-based systems operating in the life sciences and in medicine. First, in genome-wide association studies which have contributed to discovery of a head and neck cancer mutation association. Second, medical records analysis which has significantly increased the statistical power of treatment/outcome models in the UK’s largest psychiatric patient cohort. Third, richer constructs in drug-related searching. We also explore the ways in which the GATE family supports the various stages of the lifecycle present in our examples. We conclude that the deployment of text mining for document abstraction or rich search and navigation is best thought of as a process, and that with the right computational tools and data collection strategies this process can be made defined and repeatable. The GATE research programme is now 20 years old and has grown from its roots as a specialist development tool for text processing to become a rather comprehensive ecosystem, bringing together software developers, language engineers and research staff from diverse fields. GATE now has a strong claim to cover a uniquely wide range of the lifecycle of text analysis systems. It forms a focal point for the integration and reuse of advances that have been made by many people (the majority outside of the authors’ own group) who work in text processing for biomedicine and other areas. GATE is available online under GNU open source licences and runs on all major operating systems. Support is available from an active user and developer community and also on a commercial basis.

Public Library of Science (PLOS)

Directory of Open Access Journals

PubMed Central

White Rose Research Online

FigShare

Towards Detecting Rumours in Social Media

Authors: Bontcheva K., Liakata M., Procter R., Tolmie P., Zubiaga A.
Publication date: 01/01/2015

The spread of false rumours during emergencies can jeopardise the well-being of citizens as they monitor the stream of news from social media to stay abreast of the latest updates. In this paper, we describe the methodology we have developed within the PHEME project for the collection and sampling of conversational threads, as well as the tool we have developed to facilitate the annotation of these threads so as to identify rumourous ones. We describe the annotation task conducted on threads collected during the 2014 Ferguson unrest and we present and analyse our findings. Our results show that we can effectively collect social media rumours and identify multiple rumours associated with a range of stories that would have been hard to identify by relying on existing techniques that need manual input of rumour-specific keywords.

arXiv.org e-Print Archive

CiteSeerX

White Rose Research Online